LaundroGraph: Self-Supervised Graph Representation Learning for Anti-Money Laundering
Anti-money laundering (AML) regulations require financial institutions to
deploy AML systems based on a set of rules that, when triggered, form the basis
of a suspicious alert to be assessed by human analysts. Reviewing these cases
is a cumbersome and complex task that requires analysts to navigate a large
network of financial interactions to validate suspicious movements.
Furthermore, these systems have very high false positive rates (estimated to be
over 95%). The scarcity of labels hinders the use of alternative systems based
on supervised learning, reducing their applicability in real-world
applications.
In this work we present LaundroGraph, a novel self-supervised graph
representation learning approach to encode banking customers and financial
transactions into meaningful representations. These representations are used to
provide insights to assist the AML reviewing process, such as identifying
anomalous movements for a given customer. LaundroGraph represents the
underlying network of financial interactions as a customer-transaction
bipartite graph and trains a graph neural network on a fully self-supervised
link prediction task. We empirically demonstrate that our approach outperforms
other strong baselines on self-supervised link prediction using a real-world
dataset, improving the best non-graph baseline by p.p. of AUC. The goal is
to increase the efficiency of the reviewing process by supplying these
AI-powered insights to the analysts upon review. To the best of our knowledge,
this is the first fully self-supervised system within the context of AML
detection. Comment: Accepted at ACM International Conference on AI in Finance 2022
(ICAIF'22)
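The graph construction and the self-supervised link prediction setup described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `(tx_id, sender, receiver)` schema, the function names, and the uniform negative sampling are all assumptions, and the graph neural network that would be trained on these samples is omitted.

```python
import random
from collections import defaultdict


def build_bipartite_graph(transactions):
    """Build a customer-transaction bipartite graph.

    `transactions` is a list of (tx_id, sender, receiver) tuples.
    This schema is hypothetical and chosen only for illustration.
    """
    customer_to_tx = defaultdict(set)
    tx_to_customer = defaultdict(set)
    for tx_id, sender, receiver in transactions:
        for customer in (sender, receiver):
            customer_to_tx[customer].add(tx_id)
            tx_to_customer[tx_id].add(customer)
    return customer_to_tx, tx_to_customer


def link_prediction_samples(customer_to_tx, tx_to_customer, seed=0):
    """Create (customer, tx_id, label) training samples without manual labels.

    Every observed edge is a positive (label 1). For each positive, one
    customer that is NOT linked to the transaction is drawn uniformly at
    random as a negative (label 0). Uniform sampling is an assumption.
    """
    rng = random.Random(seed)
    customers = sorted(customer_to_tx)
    samples = []
    for tx_id in sorted(tx_to_customer):
        linked = tx_to_customer[tx_id]
        candidates = [c for c in customers if c not in linked]
        for customer in sorted(linked):
            samples.append((customer, tx_id, 1))
            if candidates:
                samples.append((rng.choice(candidates), tx_id, 0))
    return samples
```

The labels here come from the graph structure itself, which is what makes the task self-supervised: no analyst-provided label is needed to produce a training signal.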
Transportation in Social Media: an automatic classifier for travel-related tweets
In recent years, researchers in the field of intelligent transportation
systems have made several efforts to extract valuable information from social
media streams. However, collecting domain-specific data from any social media
is a challenging task demanding appropriate and robust classification methods.
In this work we focus on exploring geo-located tweets in order to create a
travel-related tweet classifier using a combination of bag-of-words and word
embeddings. The resulting classification makes it possible to identify
interesting spatio-temporal relations in São Paulo and Rio de Janeiro.
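One plausible way to combine bag-of-words and word-embedding features for a tweet is to concatenate a term-count vector with the mean of the tweet's word vectors. The sketch below assumes exactly that; the abstract does not say how the two representations are combined, and the whitespace tokenizer, the vocabulary, and the embedding table are hypothetical placeholders.

```python
from collections import Counter


def bow_vector(tokens, vocabulary):
    """Count how often each vocabulary word occurs in the tweet."""
    counts = Counter(tokens)
    return [float(counts[word]) for word in vocabulary]


def mean_embedding(tokens, embeddings, dim):
    """Average the embeddings of the tokens that have a known vector."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def tweet_features(text, vocabulary, embeddings, dim):
    """Concatenate bag-of-words counts with the mean word embedding.

    Lowercasing and whitespace splitting stand in for a real tokenizer.
    """
    tokens = text.lower().split()
    return bow_vector(tokens, vocabulary) + mean_embedding(tokens, embeddings, dim)
```

The resulting feature vector could then be passed to any standard classifier to separate travel-related tweets from the rest.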
Characterizing Geo-located Tweets in Brazilian Megacities
This work presents a framework for collecting, processing and mining
geo-located tweets in order to extract meaningful and actionable knowledge in
the context of smart cities. We collected and characterized more than 9M tweets
from the two biggest cities in Brazil, Rio de Janeiro and São Paulo. We
performed topic modeling using the Latent Dirichlet Allocation model to produce
an unsupervised distribution of semantic topics over the stream of geo-located
tweets as well as a distribution of words over those topics. We manually
labeled and aggregated similar topics, obtaining a total of 29 different topics
across both cities. Results showed similarities in the majority of topics for
both cities, reflecting similar interests and concerns among the population of
Rio de Janeiro and São Paulo. Nevertheless, some specific topics are more
predominant in one of the cities.
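To make the topic modeling step concrete, the following is a small collapsed Gibbs sampler for Latent Dirichlet Allocation. It is a sketch under stated assumptions, not the pipeline used in the work: the inference method, the hyperparameters `alpha` and `beta`, and the iteration count are illustrative choices, and documents are assumed to be pre-tokenized lists of words.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    Returns (doc_topic, topic_word): per-document topic counts and
    per-topic word counts, from which the two distributions mentioned
    in the abstract can be estimated by normalizing.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [{w: 0 for w in vocab} for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Random initial topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word
```

Topics found this way are unlabeled word distributions, which is why a manual labeling and aggregation step, as described above, is still needed to name them and merge similar ones.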